{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Further Hypothesis Testing"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Select this cell and type Ctrl-Enter to execute the code below.\n",
    "\n",
    "import numpy as np\n",
    "import pandas as pd\n",
    "from scipy import stats\n",
    "import matplotlib.pyplot as plt\n",
    "\n",
    "data = pd.read_csv(\"stars.csv\")\n",
    "type_key = ['Brown Dwarf', 'Red Dwarf', 'White Dwarf', 'Main Sequence', 'Supergiant','Hypergiant']\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 2 - Comparing means of two groups"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For now, let's look at only types 4 and 5 (supergiant and hypergiant). These are of particular interest to your supervisor, Dr Howe."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "types = [4,5]\n",
    "\n",
    "sample = data.luminosity.apply(np.log)\n",
    "grouped = sample.groupby(data.type)\n",
    "\n",
    "xlab = 'log(luminosity)'\n",
    "ylab = 'freq'\n",
    "\n",
    "displayed = pd.concat([grouped.get_group(t) for t in types])\n",
    "bins = np.linspace(displayed.min(), displayed.max(), 20)\n",
    "ax = plt.axes()\n",
    "ax.set_xlabel(xlab)\n",
    "ax.set_ylabel(ylab)\n",
    "\n",
    "plt.hist(grouped.get_group(types[0]), bins, alpha=0.5, label=type_key[types[0]], color='C' + str(types[0]))\n",
    "plt.hist(grouped.get_group(types[1]), bins, alpha=0.5, label=type_key[types[1]], color='C' + str(types[1]))\n",
    "\n",
    "ax.legend(loc='upper left')\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Dr Howe has noticed that supergiants and hypergiants seem to have very similar luminosity distributions. She asks you to check whether they have the same mean."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Question: do types 4 and 5 have the same mean luminosity?"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The sample means of log(luminosity) are easy to obtain:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "type4 = data[data.type == 4].luminosity.apply(np.log)\n",
    "type5 = data[data.type == 5].luminosity.apply(np.log)\n",
    "\n",
    "mean4 = type4.mean()\n",
    "mean5 = type5.mean()\n",
    "\n",
    "print('Type 4:', mean4)\n",
    "print('Type 5:', mean5)\n",
    "print('difference:', mean4 - mean5)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "They are certainly very similar, but is the difference between them statistically significant?\n",
    "\n",
    "<br>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "From the histogram, both distributions of log(luminosity) seem approximately symmetrical and with a rough bell-curve, so for now we will assume that they are normally distributed. (We will look later at how to test for normality.)\n",
    "\n",
    "We can therefore choose a **parametric test** for the difference between two means. This means that the test uses a defined probability distribution (e.g. the normal distribution) as a model for the process that generates the data.\n",
    "\n",
    "<br>\n",
    "\n",
    "In general, if the assumptions of a parametric test are satisfied then it will provide more **statistical power** than a non-parametric alternative. Statistical power is defined as the probability that the test *correctly rejects the null hypothesis when it is false*, also known as its *sensitivity* or *true positive rate*.\n",
    "\n",
    "Different parametric test make different **assumptions** about the data, so it is important to think carefully about whether these are satisfied before deciding on a particular test.\n",
    "\n",
    "\n",
    "In this example, a [*t-test*](https://en.wikipedia.org/wiki/Student%27s_t-test) is appropriate:"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### t-test for 2 independent groups"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Theory\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "When comparing two samples (1 and 2), we will refer to their sizes as $n_1$ and $n_2$, their sample means as $\\bar{x}_1$ and $\\bar{x}_2$ and their sample standard deviations as $s_1$ and $s_2$.\n",
    "\n",
    "Recall that \n",
    "\n",
    "$$\\bar{x} = \\frac{\\sum_{i=1}^n x_i}{n}$$\n",
    "\n",
    "is the *sample mean*\n",
    "\n",
    "and \n",
    "\n",
    "$$s^2 = \\frac{\\sum_{i=1}^n (x_i - \\bar{x})^2}{n-1}$$\n",
    "\n",
    "is the *unbiased sample variance*.\n",
    "\n",
    "<br>\n",
    "\n",
    "For our example, we need a two-tailed test:\n",
    "\n",
    "$H_0$: The two samples come from the same distribution with mean $\\mu = \\mu_1 = \\mu_2$.\n",
    "\n",
    "$H_1$: The samples come from two different distributions, with means $\\mu_1 \\ne \\mu_2$.\n",
    "\n",
    "<br>\n",
    "\n",
    "The test statistic is given by\n",
    "\n",
    "$$t = \\frac{\\bar {x}_1 - \\bar{x}_2}{s_p \\cdot \\sqrt{\\frac{1}{n_1}+\\frac{1}{n_2}}}$$,\n",
    "\n",
    "where \n",
    "\n",
    "$$s_p^2 = \\frac{\\left(n_1-1\\right)s_1^2 + \\left(n_2-1\\right)s_2^2}{n_1 + n_2-2}$$ \n",
    "\n",
    "is an unbiased estimator of the *pooled variance* of the two samples.\n",
    "\n",
    "<br>\n",
    "\n",
    "Under $H_0$, the test statistic $t$ follows a *Student's t-distribution* with $n_1 + n_2 - 2$ degrees of freedom.\n",
    "\n",
    "We use this distribution to calculate a p-value for the observed value of the test statistic, $t$.\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Assumptions\n",
    "\n",
    "- The means of the two samples follow normal distributions. This is true if the samples themselves are normal, but also true for any other distribution if $n$ is large (by the *central limit theorem*).\n",
    "- The two populations have equal variance.\n",
    "- The two samples are independent.\n",
    "\n",
    "Two-sample t-tests are robust to moderate deviations from these assumptions, but major deviations may produce misleading results."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Application\n",
    "\n",
    "$H_0$: $\\mu_{\\text{type4}} = \\mu_{\\text{type5}}$.\n",
    "\n",
    "$H_1$: $\\mu_{\\text{type4}} \\ne \\mu_{\\text{type5}}$.\n",
    "\n",
    "Let's set a significance level $\\alpha=0.05$\n",
    "\n",
    "Using a statistical library such as `scipy.stats`, we just supply the data for each sample and the test function deals with the rest:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "stats.ttest_ind(type4, type5)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can visualise this result on the t-distribution:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "t_obs = stats.ttest_ind(type4, type5).statistic\n",
    "df = len(type4) + len(type5) - 2\n",
    "print(\"degrees of freedom:\", df)\n",
    "\n",
    "tmin = -4\n",
    "tmax = 4\n",
    "\n",
    "t = stats.t(df)\n",
    "x = np.linspace(tmin,tmax,100)\n",
    "ax = plt.axes()\n",
    "ax.set_xlabel(\"t\")\n",
    "ax.set_ylabel(\"pdf\")\n",
    "plt.plot(x, t.pdf(x), color='gray')\n",
    "\n",
    "# the area of the shaded region is the two-tailed p-value\n",
    "lower_tail = np.linspace(tmin,-t_obs,100)\n",
    "upper_tail = np.linspace(t_obs,tmax,100)\n",
    "plt.fill_between(lower_tail,t.pdf(lower_tail),color='lightgrey')\n",
    "plt.fill_between(upper_tail,t.pdf(upper_tail),color='lightgrey')\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The t-test p-value is greater than than $\\alpha$, so we accept the null hypothesis that the means are equal."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Other types of t-test"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### One-tailed t-test\n",
    "\n",
    "In the example above, we used a *two-tailed test* (because $H_1:\\mu_1 \\ne \\mu_2$ was symmetrical). \n",
    "\n",
    "For a *one-tailed test*, we need to halve the two-sided p-value, e.g. \n",
    "\n",
    "$H_1:\\mu_{\\text{type4}}>\\mu_{\\text{type5}}$ would give us "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "ax = plt.axes()\n",
    "ax.set_xlabel(\"t\")\n",
    "ax.set_ylabel(\"pdf\")\n",
    "plt.plot(x, t.pdf(x), color='gray')\n",
    "\n",
    "# the area of the shaded region is the one-tailed p-value for H1: mu_1 > mu_2\n",
    "upper_tail = np.linspace(t_obs,tmax,100)\n",
    "plt.fill_between(upper_tail,t.pdf(upper_tail),color='lightgrey')\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "p_2tailed = stats.ttest_ind(type4, type5).pvalue\n",
    "p_1tailed = p_2tailed/2\n",
    "print('1-tailed p-value:', p_1tailed)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To test the complementary hypothesis (i.e. $H_1:\\mu_1<\\mu_2$), we would need to use `1 - p_1tailed`, e.g. \n",
    "\n",
    "$H_1:\\mu_{\\text{type4}}<\\mu_{\\text{type5}}$ would give us"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "ax = plt.axes()\n",
    "ax.set_xlabel(\"t\")\n",
    "ax.set_ylabel(\"pdf\")\n",
    "plt.plot(x, t.pdf(x), color='gray')\n",
    "\n",
    "# the area of the shaded region is the one-tailed p-value for H1: mu_1 < mu_2\n",
    "lower_tail = np.linspace(tmin,t_obs,100)\n",
    "plt.fill_between(lower_tail,t.pdf(lower_tail),color='lightgrey')\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "p_2tailed = stats.ttest_ind(type5, type4).pvalue\n",
    "p_1tailed = p_2tailed/2\n",
    "print('1-tailed p-value:', 1 - p_1tailed)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\n",
    "#### Paired two-sample t-test\n",
    "\n",
    "Sometimes we have two samples with paired observations (for example, luminosity of the same set of stars, as measured on two different dates). This situation requires testing whether the *mean of the differences* between pairs is zero, which is called a [*paired two-sample t-test*](https://en.wikipedia.org/wiki/Student%27s_t-test#Dependent_t-test_for_paired_samples).\n",
    "\n",
    "#### Welch's t-test\n",
    "\n",
    "If the sample sizes in the two groups being compared are equal, Student's original t-test is highly robust to the presence of unequal variances. [*Welch's t-test*](https://en.wikipedia.org/wiki/Welch%27s_t-test) is an alternative that is insensitive to equality of the variances, regardless of whether the sample sizes are similar.\n",
    "\n",
    "#### One-sample t-test\n",
    "\n",
    "For cases where we want to compare a sample against a theoretical mean, we can use the [*one-sample t-test*](https://en.wikipedia.org/wiki/Student%27s_t-test#One-sample_t-test)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Alternatives to the t-test\n",
    "\n",
    "#### Mann-Whitney U-test\n",
    "\n",
    "For non-normal samples where $n$ is small, the assumptions of the t-test break down. However, we can use a *non-parametric test* to compare two samples, whatever the shape of their distributions.\n",
    "\n",
    "The [*Mann-Whitney U-test*](https://en.wikipedia.org/wiki/Mann–Whitney_U_test) (aka Wilcoxon rank-sum test) is one such test. The null hypothesis for this test is that a randomly selected value from sample 1 is equally likely to be less than or greater than a randomly selected value from sample 2. If the distributions are sufficiently different, the resulting p-value will be small and we will reject this null hypothesis. Note that the U-test does not compare the sample means directly.\n",
    "\n",
    "\n",
    "#### Wilcoxon signed-rank test\n",
    "\n",
    "The [*Wilcoxon signed-rank test*](https://en.wikipedia.org/wiki/Wilcoxon_signed-rank_test) is is the paired-sample version of the Mann-Whitney U-test.\n",
    "\n",
    "<br>\n"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.6"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
